PAC Associative Reinforcement Learning
Author
Abstract
General algorithms for the reinforcement learning problem typically learn policies in the form of a table that directly maps the states of the environment into actions. When the state space is large, these methods become impractical. One approach to increasing efficiency is to restrict the class of policies by considering only policies that can be described using some fixed representation. This paper pursues this approach and analyzes the associative reinforcement learning problem in the PAC learning framework. As a representation, we use a general form of decision lists that can describe a wide variety of restricted classes of policies. We then describe an algorithm that provably learns, with high probability, a good approximation of the optimal policy for any environment that satisfies a particular adequacy condition stipulated for the representation used. The running time of the algorithm is polynomial in the size of the representation, in addition to the usual parameters for a PAC algorithm. We give experimental results showing that the algorithm performs well in practice.
1. Supported in part by NSF grant CCR-9202158 and by an Andrew Mellon predoctoral fellowship at the University of Pittsburgh.
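To make the decision-list representation concrete, the following is a minimal sketch in Python, not the paper's implementation: the class name, feature names, and actions are illustrative assumptions. A decision-list policy is an ordered list of (condition, action) rules; the first rule whose condition holds in the current state determines the action, and a default action applies when no rule fires.

    from typing import Callable, Dict, List, Tuple

    # Illustrative types: a state is a mapping from feature names to booleans,
    # and an action is just a string label (all names here are assumptions).
    State = Dict[str, bool]
    Action = str
    Rule = Tuple[Callable[[State], bool], Action]

    class DecisionListPolicy:
        """Ordered (condition, action) rules; the first matching rule acts."""

        def __init__(self, rules: List[Rule], default_action: Action) -> None:
            self.rules = rules
            self.default_action = default_action

        def act(self, state: State) -> Action:
            # Scan the rules top to bottom; the first condition that holds
            # for this state fixes the chosen action.
            for condition, action in self.rules:
                if condition(state):
                    return action
            # No rule fired: fall back to the default action.
            return self.default_action

    # Toy usage with two hypothetical boolean state features.
    policy = DecisionListPolicy(
        rules=[
            (lambda s: s["obstacle_ahead"], "turn"),
            (lambda s: s["goal_visible"], "advance"),
        ],
        default_action="explore",
    )
    print(policy.act({"obstacle_ahead": False, "goal_visible": True}))  # -> advance

Because rules are checked in a fixed order, such a list can compactly express policies whose tabular form would be exponentially large in the number of state features.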
Similar Resources
On Polynomial Time PAC Reinforcement Learning with Rich Observations
We study the computational tractability of provably sample-efficient (PAC) reinforcement learning in episodic environments with high-dimensional observations. We present new sample efficient algorithms for environments with deterministic hidden state dynamics but stochastic rich observations. These methods represent computationally efficient alternatives to prior algorithms that rely on enumera...
PAC-Bayesian Policy Evaluation for Reinforcement Learning
Bayesian priors offer a compact yet general means of incorporating domain knowledge into many learning tasks. The correctness of the Bayesian analysis and inference, however, largely depends on accuracy and correctness of these priors. PAC-Bayesian methods overcome this problem by providing bounds that hold regardless of the correctness of the prior distribution. This paper introduces the first...
Lower PAC bound on Upper Confidence Bound-based Q-learning with examples
Recently, there has been significant progress in understanding reinforcement learning in Markov decision processes (MDPs). We focus on improving Q-learning and analyze its sample complexity. We investigate the performance of tabular Q-learning, Approximate Q-learning, and UCB-based Q-learning. We also derive a lower PAC bound of Ω((|S||A|/ε²) ln(|A|/δ)) for UCB-based Q-learning. Two tasks, Ca...
Reinforcement Learning in Finite MDPs: PAC Analysis
We study the problem of learning near-optimal behavior in finite Markov Decision Processes (MDPs) with a polynomial number of samples. These "PAC-MDP" algorithms include the well-known E3 and R-MAX algorithms as well as the more recent Delayed Q-learning algorithm. We summarize the current state of the art by presenting bounds for the problem in a unified theoretical framework. We also present a...
UBEV - A More Practical Algorithm for Episodic RL with Near-Optimal PAC and Regret Guarantees
Statistical performance bounds for reinforcement learning (RL) algorithms can be critical for high-stakes applications like healthcare. This paper introduces a new framework for theoretically measuring the performance of such algorithms called Uniform-PAC, which is a strengthening of the classical Probably Approximately Correct (PAC) framework. In contrast to the PAC framework, the uniform vers...
Publication date: 1995